camera pose
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation
Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera pose, all in 3D. The existing methods either are ineffective or only tackle the problem partially. In this paper, we propose an end-to-end model that simultaneously solves all three tasks in real-time given only a single RGB image. The essence of the proposed method is to improve the prediction by i) parametrizing the targets (e.g., 3D boxes) instead of directly estimating the targets, and ii) cooperative training across different modules in contrast to training these modules individually. Specifically, we parametrize the 3D object bounding boxes by the predictions from several modules, i.e., 3D camera pose and object attributes. The proposed method provides two major advantages: i) The parametrization helps maintain the consistency between the 2D image and the 3D world, thus largely reducing the prediction variances in 3D coordinates.
Unsupervised Learning of Shape and Pose with Differentiable Point Clouds
We address the problem of learning accurate 3D shape and camera pose from a collection of unlabeled category-specific images. We train a convolutional network to predict both the shape and the pose from a single image by minimizing the reprojection error: given several views of an object, the projections of the predicted shapes to the predicted camera poses should match the provided views. To deal with pose ambiguity, we introduce an ensemble of pose predictors which we then distill to a single student model. To allow for efficient learning of high-fidelity shapes, we represent the shapes by point clouds and devise a formulation allowing for differentiable projection of these. Our experiments show that the distilled ensemble of pose predictors learns to estimate the pose accurately, while the point cloud representation allows to predict detailed shape models.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
NA VI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations
Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Asia > Middle East > Israel (0.04)
- (4 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Greece > Ionian Islands > Corfu (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > Middle East > Israel (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)